MPI and Embedded TCP/IP Gigabit Ethernet Cluster Computing
Author
Abstract
A group of lower-cost PCs connected via Gigabit Ethernet and using MPI for communication between multiple parallel processes running simultaneously on all hosts provides a cost-effective and powerful computing solution. The processing load for interprocess communication over TCP is significant when the parallel processes must exchange large amounts of data. With a standard gigabit network interface card (NIC), TCP communication at near wire speed (above 800 Mbit/s) consumes almost the entire processing capacity of a 1 GHz Pentium III, or around 30% of a 2.4 GHz Pentium 4. This communication overhead significantly reduces the computational power of economical two-processor systems. NICs that perform the protocol processing on the card offer the possibility of reducing this overhead substantially. This study evaluates the performance and cost-effectiveness of using a NIC with embedded TCP/IP processing to offload the network processing and allow more MPI processes per host.

How might a NIC with embedded TCP/IP protocol processing (hereafter referred to as an "embedded NIC") improve the performance of the hosts within a computing cluster? Most importantly, it could eliminate the time the host spends in the kernel processing each incoming packet and switching between kernel and user mode. It might also reduce memory load by moving data directly between user memory space and the interface without intervening copies. Finally, it might process packets more quickly, decreasing latency and increasing the data transfer rate toward the maximum allowed by the links' Gigabit data rate. These benefits may allow each host to run an additional MPI process, either to improve overall performance or to allow equivalent performance with fewer hosts. It is this second option, reducing the number of hosts, that is used for the cost comparison.
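The per-byte CPU cost of in-kernel TCP processing described above can be illustrated with a small probe. The sketch below (an illustration of the measurement idea, not the study's actual benchmark; all function names are ours) pushes data through a loopback TCP connection and compares wall-clock time to process CPU time:

```python
import socket
import threading
import time

def run_throughput_probe(total_bytes=64 * 1024 * 1024, chunk=64 * 1024):
    """Send `total_bytes` over a loopback TCP connection and report
    bytes sent, wall time, and CPU time. The CPU/wall ratio gives a
    rough sense of how much of a core TCP processing consumes; note
    that process_time() counts both the sender and receiver threads."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))   # ephemeral port on loopback
    srv.listen(1)
    port = srv.getsockname()[1]

    def sink():
        # Receiver: drain the connection until the sender closes it.
        conn, _ = srv.accept()
        while conn.recv(1 << 16):
            pass
        conn.close()

    t = threading.Thread(target=sink)
    t.start()
    cli = socket.create_connection(("127.0.0.1", port))
    buf = b"x" * chunk
    t0, c0 = time.perf_counter(), time.process_time()
    sent = 0
    while sent < total_bytes:
        cli.sendall(buf)
        sent += chunk
    cli.close()
    t.join()
    srv.close()
    return sent, time.perf_counter() - t0, time.process_time() - c0

if __name__ == "__main__":
    sent, wall, cpu = run_throughput_probe()
    print(f"{sent / wall / 1e6:.0f} MB/s, CPU/wall ratio {cpu / wall:.2f}")
```

On loopback there is no wire, so the absolute numbers differ from the study's Gigabit measurements, but the same principle applies: every byte moved through the kernel socket path costs CPU cycles that an offload NIC would reclaim.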
The first portion of the study focused on measuring the processing load and performance of a system with standard NICs to determine the theoretical gains possible. The actual communication load of computational processes using MPI varies widely depending on the amount of data that must be exchanged. Synthetic tests that maximized network load were used to determine worst-case loading and give an upper bound on the gains that might be obtained with an embedded NIC. Embedded NICs from Alacritech were then used to directly compare loading, data rate, and latency with the reference systems. These cards are still in beta-stage development and showed multiple inconsistencies and problems of the kind to be expected from a new technology. These problems limited some tests and reduced some performance measurements, so the final results are not conclusive.

The embedded 1000BaseT NIC used in our study cost $795. To be economically justifiable, the embedded NIC must provide performance improvements that allow a reduction in the number of hosts and network switch ports that at least offsets this additional cost per host. Dual-processor hosts with fast CPUs (2+ GHz Pentium 4) and larger amounts of memory (1+ GB) cost $3.5-4K per host including the interface card. Copper interfaces and switches are far cheaper than the fiber-optic versions; a smaller copper-interface (24-48 port) switch may cost $200 per port, a large switch still more. A first-order comparison using the lowest per-host costs gives $4500 per host with an embedded NIC versus $3700 without, a ratio of 1.22 to 1. This implies that hosts with the embedded NIC must provide a 22% improvement in performance to allow the number of hosts to be cut for an equivalent total cost at constant performance. Can this be achieved?
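The break-even arithmetic above is simple enough to state as a formula: the required per-host speedup equals the cost ratio of the two host configurations. A minimal sketch, using the study's rounded per-host figures (the function name is ours, and switch-port savings are ignored, making the bound slightly conservative):

```python
def required_speedup(cost_with_nic, cost_without_nic):
    """Per-host speedup needed so that a smaller cluster of more
    expensive hosts matches the baseline cluster's total cost at
    equal aggregate performance (cost ratio = required speedup)."""
    return cost_with_nic / cost_without_nic

# First-order per-host costs from the study: ~$4500 with the $795
# embedded NIC (rounded), ~$3700 with a standard NIC.
ratio = required_speedup(4500, 3700)
print(f"required per-host speedup: {ratio:.2f}x ({(ratio - 1) * 100:.0f}%)")
# prints: required per-host speedup: 1.22x (22%)
```

The same function can be reused with site-specific prices; a cheaper offload NIC or more expensive switch ports would lower the speedup the embedded NIC must deliver.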
Similar Resources
Performance Comparison of LAM/MPI, MPICH, and MVICH on a Linux Cluster Connected by a Gigabit Ethernet Network (Proceedings of the 4th Annual Linux Showcase, USENIX Association)
We evaluate and compare the performance of LAM, MPICH, and MVICH on a Linux cluster connected by a Gigabit Ethernet network. Performance statistics are collected using NetPIPE which show the behavior of LAM/MPI and MPICH over a gigabit network. Since LAM and MPICH use the TCP/IP socket interface for communicating messages, it is critical to have high TCP/IP performance. Despite many efforts to ...
Factors Involved in the Performance of Computations on Beowulf Clusters
Beowulf (PC) clusters represent a cost-effective platform for large scale scientific computations. In this paper, we discuss the effects of some possible configuration, hardware, and software choices on the communications latency and throughput attainable, and the consequent impact on scalability and performance of codes. We compare performance currently attainable using Gigabit Ethernet with t...
Performance Evaluation of Gigabit Ethernet and Myrinet for System-Area-Networks
Low latency and high bandwidth networking is essential for cluster computing and System-Area-Networks (SAN). The performance of a SAN optimized interconnect, Myrinet, is compared with gigabit Ethernet running TCP/IP. Though Myrinet has lower latencies and higher throughput than gigabit Ethernet, it is found that an efficient implementation of message passing interface library over TCP/IP achiev...
Distributed Computing with the CLAN Network
CLAN (Collapsed LAN) is a high performance user-level network targeted at the server room. It presents a simple low-level interface to applications: connection-oriented non-coherent shared memory for data transfer, and Tripwire, a user-level programmable CAM for synchronisation. This simple interface is implemented using only hardware state machines on the NIC, yet is flexible enough to support...